Document Clustering Based on Spectral Clustering and Non-negative Matrix Factorization

Authors

  • Lei Bao
  • Sheng Tang
  • Jintao Li
  • Yongdong Zhang
  • Wei-ping Ye
Abstract

In this paper, we propose a novel non-negative matrix factorization (NMF) of the affinity matrix for document clustering that enforces non-negativity and orthogonality constraints simultaneously. With the orthogonality constraints, this NMF provides a solution to spectral clustering, so it inherits the advantages of spectral clustering and offers a more reasonable clustering interpretation than previous NMF-based clustering methods. With the non-negativity constraints, the proposed method is also superior to traditional eigenvector-based spectral clustering: like other NMF-based methods, it yields a non-negative solution that is intuitive and from which the final clusters can be derived directly. The proposed method thus combines the advantages of spectral clustering and NMF-based methods and outperforms both, as demonstrated by experimental results on the TDT2 and Reuters-21578 corpora.
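The idea of reading clusters directly off a non-negative factor of the affinity matrix can be illustrated with a small sketch. This is not the algorithm proposed in the paper: it assumes a plain symmetric factorization W ≈ H Hᵀ with H ≥ 0, fitted with a commonly used damped multiplicative update, and it does not enforce the orthogonality constraint explicitly. The function names, the toy affinity matrix, and the parameters `beta`, `n_iter`, and `eps` are all hypothetical choices made for illustration.

```python
import numpy as np

def symmetric_nmf(W, k, n_iter=500, beta=0.5, seed=0, eps=1e-10):
    """Approximate a symmetric non-negative matrix W by H @ H.T with H >= 0.

    Illustrative stand-in only: a damped multiplicative update for
    symmetric NMF, not the orthogonality-constrained method of the paper.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    # Strictly positive random start so multiplicative updates stay positive.
    H = rng.random((n, k)) + 0.1
    for _ in range(n_iter):
        numer = W @ H
        denom = H @ (H.T @ H) + eps  # eps avoids division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H

def cluster_labels(H):
    """Assign each document to the column of H with the largest weight."""
    return np.argmax(H, axis=1)

# Toy affinity matrix (assumed data): documents 0-2 are mutually similar,
# documents 3-5 are mutually similar, and cross-group similarity is weak.
W = np.full((6, 6), 0.05)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0

H = symmetric_nmf(W, k=2)
labels = cluster_labels(H)
print(labels)
```

Because every entry of H is non-negative, each row can be read as a soft membership vector, and the largest entry gives the cluster assignment without a separate k-means step on eigenvectors. Which group receives label 0 depends on the random initialization.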

Similar resources

Refinement of Document Clustering by Using NMF

In this paper, we use non-negative matrix factorization (NMF) to refine document clustering results. NMF is a dimensionality reduction method and is effective for document clustering because a term-document matrix is high-dimensional and sparse. The initial matrix of the NMF algorithm can be regarded as a clustering result, so we can use NMF as a refinement method. First we perform min-max cu...

Document Clustering Using Term Weights and Class Label Terms Based on Semantic Features

Class labels for clustering can be generated automatically, but they are of much lower quality than labels specified by humans. In this paper, we propose a new method for enhancing document clustering that uses class-label terms and term weights. Through non-negative matrix factorization (NMF), the class-label terms can represent the inherent structure of document clusters well. It can also improve the quality ...

A Nonlinear Orthogonal Non-Negative Matrix Factorization Approach to Subspace Clustering

A recent theoretical analysis shows the equivalence between non-negative matrix factorization (NMF) and the spectral-clustering-based approach to subspace clustering. As NMF and many of its variants are essentially linear, we introduce a nonlinear NMF with explicit orthogonality and derive general kernel-based orthogonal multiplicative update rules to solve the subspace clustering problem. In nonlin...

Clinical Document Clustering using Multi-view Non-Negative Matrix Factorization

Clinical documents contain vital information such as symptom names, medication names, age, gender, and other demographic information. This information can be used to provide quick relief from a disease. An existing system clusters symptom names and medication names using Multi-View Non-Negative Matrix Factorization. While considering the clinical documents the facto...

Parallel Non Negative Matrix Factorization for Document Clustering

Non-negative matrix factorization has recently been used as an effective approach to document clustering. One advantage of this method is that clustering results can be derived directly from the factor matrices. This project presents parallel implementations of three algorithms for non-negative matrix factorization. Experiments with these parallel algorithms on large datasets show good speedup for...

Publication date: 2008